library(jpeg)
library(ppcor)
library(igraph)
library(shiny)

Purpose & Statistical tools used

The goal of this project is to analyze differences in fMRI (functional magnetic resonance imaging) data between subjects with autism spectrum disorder (ASD) and typically developing (TD) subjects, in order to highlight differences that could deserve closer medical study. To reach this goal we used statistical tools such as hypothesis tests based on asymptotic confidence intervals for Pearson's correlation coefficient \(\rho\) and for the partial correlation coefficient \(\rho^{(P)}\). Differences between the two approaches are discussed later.

Dataset

load('data/hw2_data.RData')
load('data/coordinates.RData')

The dataset we used for this project is ABIDE (Autism Brain Imaging Data Exchange): a collection of brain imaging data and clinical information from individuals with autism spectrum disorder (ASD) and typically developing (TD) controls.

The dataset consists of structural and functional fMRI data from 12 individuals with ASD and 12 TD controls.

For each subject, we have \(n = 145\) fMRI measurements for each of the \(D = 116\) cortical regions.

A brief explanation (without being experts) of what the ABIDE dataset contains:

The eight main brain areas in the ABIDE dataset are (in parentheses you find the starting number of the corresponding brain regions):

Reference of ABIDE where we got the above information.

For the plots we used the coordinates of the 116 brain regions from the aal116coordinates package.

Pre-processing

Since the data were collected in different labs, it is worth examining and removing the lab-specific effect. For instance, we can look at the distribution of the fMRI signal at the same cortical region for two different subjects in the same group whose data were collected in different laboratories.


To remove the subject-specific effect, we subtract from each of the 116 time series its arithmetic mean and divide by its across-time standard deviation.

scale_datasets_list <- function(ls){
  scaled_list <- list()
  names <- names(ls)
  for(i in seq_along(ls)){
    
    #### Scale each column to zero mean and unit standard deviation.
    patient <- data.frame(apply(ls[[i]], 2, scale))
    colnames(patient) <- colnames(ls[[i]])
    scaled_list[[names[i]]] <- patient
  }
  return(scaled_list)
}

td_sel_scale <- scale_datasets_list(td_sel)
asd_sel_scale <- scale_datasets_list(asd_sel)

Pool Together

To enhance the separation between ASD and TD subjects, we pool the data within each group by taking a statistical summary across subjects, per time point and per ROI. In what follows we take the median (an index insensitive to changes in the scale of the measurements) as the statistical summary, to make the analysis more robust.

We assumed double independence within each group (across time and across subjects): we do not model the time dependency, so we treat the data as \(i.i.d.\)

Therefore we will study the correlations between these medians for both groups.

#### Collect the value of cell (i, j) across all datasets in the list
cells_value_array <- function(ls, i, j){
  cells_value <- c()
  for(patient in ls){
    cells_value <- c(cells_value, patient[i, j])
  }
  return(cells_value)
}


#### Choose metric in c('mean', 'median', 'sd')
summary_dataset <- function(ls, metric = 'mean'){
  if(metric == 'mean'){fun <- mean}
  if(metric == 'median'){fun <- median}
  if(metric == 'sd'){fun <- sd}
  
  n <- nrow(ls[[1]])
  m <- ncol(ls[[1]])
  summary_data <- matrix(rep(NA, n * m), n, m)
  for(i in 1:n){
    for(j in 1:m){
      #### Summarize the (i, j) cell across subjects
      summary_data[i, j] <- fun(cells_value_array(ls, i, j))
    }
  }
  data_frame <- data.frame(summary_data)
  colnames(data_frame) <- colnames(ls[[1]])
  return(data_frame)
}


TD <- summary_dataset(td_sel_scale, metric='median')
ASD <- summary_dataset(asd_sel_scale, metric='median')

As a consequence of this approach we obtain, for each group, a pooled \(145 \times 116\) data frame; computing the Pearson correlations on each of them yields two \(116 \times 116\) matrices that represent the correlations between all pairs of cortical regions.
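As a small sketch of this step (on toy data, not the real ABIDE matrices), pooled data frames with \(n\) rows and \(D\) columns yield \(D \times D\) correlation matrices via `cor()`:

```r
#### Toy sketch: pooled n x D data frames -> D x D correlation matrices.
#### Here D = 5 for readability; D = 116 in the real data.
set.seed(42)
n <- 145; D <- 5
toy_TD  <- data.frame(matrix(rnorm(n * D), n, D))
toy_ASD <- data.frame(matrix(rnorm(n * D), n, D))
corr_TD  <- cor(toy_TD)    # D x D symmetric matrix, ones on the diagonal
corr_ASD <- cor(toy_ASD)
dim(corr_TD)
```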

Correlations can be used to examine the relationship between brain fMRI signals. By calculating correlations between these regions, we can determine whether there is a relationship and how strong it is. It is a way of understanding how different regions of the brain are connected and how they influence each other's activity. If two brain regions consistently show strong positive correlations, their activity tends to increase or decrease together, which might suggest that they are strongly connected and influence each other's activity.

First, we can look at the empirical distribution of the Pearson correlation coefficients within the groups. The distribution of correlations can provide insights into brain function and can help us understand how different brain regions contribute to various cognitive processes.


To evaluate dependency between cortical regions we used an association graph.

For each pair \(\{j, k\}\) of cortical regions we tested, at significance level \(\alpha = 0.05\), whether they are uncorrelated:

\[\begin{cases} H_{0} : \rho_{j,k} = 0 \\ H_{1} : \rho_{j,k} \neq 0 \end{cases}\]

To do so, we compute a confidence interval for each pair \(\{j, k\}\), rejecting the null hypothesis if the interval does not contain the value 0.

More generally, given a threshold \(t\), we can require an arbitrarily strong correlation between the areas by placing an edge between feature \(j\) and feature \(k\) whenever \([-t, +t] \cap C_{j,k}(\alpha) = \emptyset\) (the empty set).

Choice of the threshold t

Since we only care about the strength of the correlation between two areas, we slide the threshold \(t\) over the percentiles of the distribution of the absolute values of \(\rho\).


We will slide \(t\) over the percentiles from 0 to 0.7, showing the differences in the number of edges and discovering which edges are more resilient.
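A minimal sketch of how such a grid of thresholds can be computed (the correlation matrix here is built from toy data; in the report it would be the pooled group matrix):

```r
#### Hypothetical sketch: candidate thresholds t taken as percentiles of
#### the absolute off-diagonal correlation coefficients.
set.seed(1)
corr_matrix <- cor(matrix(rnorm(145 * 10), 145, 10))   # toy 10 x 10 matrix
abs_rho <- abs(corr_matrix[upper.tri(corr_matrix)])    # off-diagonal |rho|
t_grid  <- quantile(abs_rho, probs = seq(0, 0.7, by = 0.1))
t_grid
```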

Process to compute the Confidence intervals

To compute confidence intervals for a family of correlation coefficients, we used Fisher's Z-transformation. This method converts the correlation coefficients to a new scale called z-scores.

\[\hat z_{j,k} = \frac{1}{2}\log\big(\frac{1 + \hat\rho_{j,k}}{1 - \hat\rho_{j,k}}\big) = \operatorname{artanh}(\hat\rho_{j,k})\]

which is asymptotically normally distributed as a Gaussian with mean \(z_{j,k}\) and standard error \(\frac{1}{\sqrt{N-3}}\).

Once the correlation coefficients are converted to z-scores, we can use the normal distribution to build confidence intervals, and then convert the lower and upper bounds back to the original scale using the inverse z-transformation.
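For a single pair, the procedure above can be sketched as follows (toy data; note that base R's `atanh`/`tanh` implement exactly the transform pair used in the formulas):

```r
#### Minimal sketch of a Fisher-z confidence interval for one Pearson
#### correlation coefficient (illustrative data and sample size).
set.seed(7)
n <- 145
x <- rnorm(n); y <- 0.5 * x + rnorm(n)
rho_hat <- cor(x, y)
z_hat   <- atanh(rho_hat)                  # Fisher's z-transform
se      <- 1 / sqrt(n - 3)                 # asymptotic standard error
alpha   <- 0.05
ci_z    <- z_hat + c(-1, 1) * qnorm(1 - alpha / 2) * se
ci_rho  <- tanh(ci_z)                      # back-transform to the rho scale
ci_rho
```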

Since we are dealing with a family-wise test, in order to control the probability of making at least one error we adjust the confidence level \(\alpha\) using the Bonferroni correction: the level is divided by the number of correlation coefficients we are testing, \(m = \binom{D}{2} = \frac{116 \times 115}{2} = 6670\).

The Bonferroni adjustment is a conservative method, meaning that it tends to err on the side of not rejecting the null hypothesis. Since \(\alpha_{BON} = \frac{\alpha}{m} \approx 7.5 \times 10^{-6}\), the confidence intervals turn out wider and more often intersect the interval \([-t, t]\). This can lead to not placing edges between cortical regions where there should be one.

lower_or_upper <- function(data, data2 = NULL, bound, cor_type = 'normal', bonferroni = TRUE, delta = FALSE){
  
  #### Setting parameters
  n <- dim(data)[1]
  D <- dim(data)[2]
  alpha <- .05
  m <- choose(D, 2)   # Binomial coefficient
  
  #### Bonferroni correction
  if(isTRUE(bonferroni)){ alpha <- alpha / m }
  
  #### Use "Correlation" or "Partial Correlation"; with delta = TRUE
  #### the coefficient is the difference between the two groups' estimates
  if(cor_type == 'normal'){
    g <- 0
    corr_matrix <- cor(data)
    if(isTRUE(delta)){ corr_matrix <- corr_matrix - cor(data2) }
  }
  if(cor_type == 'partial'){
    g <- D - 2
    corr_matrix <- pcor(data)$estimate
    if(isTRUE(delta)){ corr_matrix <- corr_matrix - pcor(data2)$estimate }
  }
  
  #### Computing the Fisher Z-transform
  Z_j_k <- (1/2) * log((1 + corr_matrix) / (1 - corr_matrix))
  
  #### Confidence interval bound on the z scale
  se <- sqrt(1 / (n - g - 3))
  if(bound == 'L'){ Log_bound <- Z_j_k - qnorm(1 - (alpha/2)) * se }
  if(bound == 'U'){ Log_bound <- Z_j_k + qnorm(1 - (alpha/2)) * se }
  
  #### Back-transform the bound to the rho scale
  rho_bound <- (exp(2 * Log_bound) - 1) / (exp(2 * Log_bound) + 1)
  
  #### Remove NAs (on the diagonal)
  rho_bound[is.na(rho_bound)] <- 1
  
  return(rho_bound)
}

By increasing the value of the threshold \(t\) we require a stronger correlation between the two areas of the brain.

We set the adjacency matrix as a \((D \times D)\) matrix where entry \(\{j, k\}\) is 1 if the confidence interval for the pair does not intersect the interval \([-t, t]\), and 0 otherwise.

adj_matrix_func <- function(mat, t, bonf = TRUE, cor_type = 'normal', delta = FALSE, data2 = NULL){
  L <- lower_or_upper(mat, "L", bonferroni = bonf, cor_type = cor_type, delta = delta, data2 = data2)
  U <- lower_or_upper(mat, "U", bonferroni = bonf, cor_type = cor_type, delta = delta, data2 = data2)
  
  #### Edge when the whole interval [L, U] lies outside [-t, t]
  adj <- as.matrix(L > t | U < -t)
  return(adj)
}

The graph is represented by \(G = (V, E)\), where \(V = \{V_1, \dots, V_D\}\) is the vertex-set and \(E\) the edge-set. The edge-set is encoded as a \((D \times D)\) adjacency matrix where \(E(j, k) = 1\) if there is an edge between region \(j\) and region \(k\), and 0 otherwise.
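Given such an adjacency matrix, the graph object can be built with the igraph package loaded at the top of the report. A sketch on a small hand-made matrix (not the real output of the adjacency step):

```r
#### Sketch: turning a 0/1 adjacency matrix into an igraph graph G = (V, E).
library(igraph)
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), 3, 3, byrow = TRUE)   # toy 3-region example
G <- graph_from_adjacency_matrix(adj, mode = "undirected", diag = FALSE)
vcount(G)   # |V|: number of vertices
ecount(G)   # |E|: number of edges
```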

Finally, we can visualize the association graphs for the two groups while sliding the threshold \(t\), in order to see co-activation differences.

Left: Pearson correlation with Bonferroni adjustment. Right: Pearson correlation without Bonferroni adjustment.

Explanation of the plots

The plot on the left shows the association graph built using the Bonferroni correction, while the one on the right is built without it.

The plots superimpose the two association graphs, one for each group, where:

  • each node represents a cortical region of the brain, and the color of the node identifies which of the eight main areas (previously described) it belongs to.

  • each edge represents a linear correlation between the two areas stronger than the chosen threshold \(t\); the color of the edge has the following meaning:

    • black: if both the two graphs have the edge in common.

    • blue: if only the TD graph has the edge.

    • red : if only the ASD graph has the edge.

We can reformulate the problem by looking at the difference graph \(G^{\Delta}(t)\), where we place an edge if the asymptotic confidence interval for \(|\rho_{j,k}^{ASD} - \rho_{j,k}^{TD}|\) does not intersect the interval \([-t, t]\).

Left: Difference graph with Bonferroni adjustment. Right: Difference graph without Bonferroni adjustment.

Statistical properties:

All the plots are built using Pearson's correlation coefficient as the association measure.

The first pair of plots is built with the Bonferroni family-wise adjustment, while the second is built skipping the correction. As we would expect, the unadjusted one (since its tests are based on narrower confidence intervals) has definitely more edges than the other.

The difference graph uses as the association measure the absolute value of the difference between the Pearson correlation coefficients of the two groups. This means that there is an edge if the linear relationship between two areas is strong for TD subjects and weak for ASD ones (or vice versa), or if the Pearson coefficients are of opposite sign with an absolute difference greater than \(t\).

In the following plot we show the confidence intervals for Pearson's coefficient for the ASD subjects, with and without the Bonferroni correction.


Analysis

Below we describe what happens as the threshold \(t\) varies from a small to a large percentile of the correlation coefficients:

Moreover, many connections run between areas belonging to different regions. In general, the differences between the two groups lie in the structure of the connections, meaning that there is no single specific brain abnormality associated with autism. The main result is the difference in the association structure of certain regions, such as the frontal lobe, which is involved in problem solving and social processing. However, the brain changes associated with autism can vary across people, and a sample of twelve people per group may not capture all the actual differences that exist. The aim of the study is to suggest possible specific problems related to autism, so that more focused research can fully understand the underlying brain abnormalities in this disorder.

Analysis using partial correlation

Partial correlation is a measure of the strength and direction of the linear association between two variables while controlling for the effects of one or more other variables. It is useful for understanding the relationship between two variables when other variables might be influencing that relationship.
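In the report the estimates come from `ppcor::pcor()`; as a self-contained sketch, the same quantity can be computed in base R from the precision matrix (inverse covariance), since \(\rho^{(P)}_{j,k} = -\Omega_{j,k}/\sqrt{\Omega_{j,j}\,\Omega_{k,k}}\). Toy data throughout:

```r
#### Sketch: partial correlations from the precision matrix, the quantity
#### that ppcor::pcor()$estimate also returns (toy data, 4 variables).
set.seed(3)
X <- matrix(rnorm(200 * 4), 200, 4)
X[, 2] <- X[, 1] + rnorm(200, sd = 0.5)       # induce dependence between 1 and 2
Omega <- solve(cov(X))                         # precision matrix
pcor_mat <- -Omega / sqrt(outer(diag(Omega), diag(Omega)))
diag(pcor_mat) <- 1
pcor_mat[1, 2]   # partial correlation of columns 1 and 2 given 3 and 4
```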

To compute confidence intervals for a family of partial correlation coefficients, we again used Fisher's Z-transformation.

\[\hat z_{j,k} = \frac{1}{2}\log\big(\frac{1 + \hat\rho_{j,k}^{(P)}}{1 - \hat\rho_{j,k}^{(P)}}\big) = \operatorname{artanh}(\hat\rho_{j,k}^{(P)})\]

which is asymptotically normally distributed as a Gaussian with mean \(z_{j,k}\) and standard error \(\frac{1}{\sqrt{N-g-3}}\), where \(g = (D-2)\).

Since the variance is greater, the resulting confidence intervals will be wider than the ones built using Pearson's correlation.
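A quick numeric check with the report's sample sizes makes the widening concrete: with \(g = D - 2\) the standard error of the transformed coefficient more than doubles.

```r
#### Standard error of the Fisher z-score: Pearson (g = 0) vs
#### partial correlation (g = D - 2), using n = 145 and D = 116.
n <- 145; D <- 116
se_pearson <- 1 / sqrt(n - 3)             # g = 0
se_partial <- 1 / sqrt(n - (D - 2) - 3)   # g = D - 2
c(se_pearson, se_partial)
```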


As we can see, using the Bonferroni correction the confidence intervals for \(\rho_{j,k}^{(P)}\) always intersect zero, meaning that there will not be any edge in the association graph.

We can still highlight the structural difference in the association graph built without the Bonferroni correction. It suggests that the actual difference in brain connectivity between the two groups is not in a specific area but rather in the structure of the connections.

Left: Partial correlation with Bonferroni adjustment. Right: Partial correlation without Bonferroni adjustment.

This is confirmed by the plots below, which show the difference graph with and without the Bonferroni correction.

Left: Difference graph with Bonferroni adjustment. Right: Difference graph without Bonferroni adjustment.

Conclusion

In conclusion, this study has found that subjects with autism tend to show differences in the way that different brain regions are connected and communicate with each other. These differences may be related to problem solving and executive function. However, the brain changes associated with autism can vary from person to person, and a sample of twelve people per group may not capture all the actual differences that exist.